Abstract: We consider a dynamical linear network where
nearest neighbours communicate via links whose states form
binary (open/closed) valued independent and identically distributed
Markov processes.

Our main result is the tight information-theoretic lower bound 
on the network traffic required by the connection state overhead, 
or the information required for all nodes to know their connected 
neighbourhood. 

These results, and especially their possible generalisations
to more realistic network models, could give us valuable
understanding of the unavoidable protocol overheads in rapidly
changing ad hoc and sensor networks.

Index Terms: Connection state overhead, dynamic linear network,
exact series solution, entropy rate of an infinite dimensional
hidden Markov process.

I. Introduction 

In a dynamical network it is essential to keep track of the
connection state information in order to ensure efficient
transmission of data. This requires additional information, in
the form of a connection state overhead, to be sent through
the network. For networks with rapid dynamics (e.g. mobile
networks) this overhead may be large, and it is therefore of
relevance to find some quantitative measure of the required
bandwidth.

In this paper we study a simple model of a one-dimensional 
network introduced by Dey [1], in which the links form
identical, independent and time-homogeneous discrete-time 
Markov processes in an open/closed-binary space. In this case 
the required connectivity information at a given node is simply 
the length of the path of open links in either direction. The 
ensuing connection state overhead is then quantified using 
information-theoretic methods. The relevant quantity is the 
smallest possible number of bits per second required for the 
connectivity overhead. Our main result is a sequence of upper 
and lower bounds converging exponentially to this quantity, as 
well as a simple and efficient method for their computation. 

To our knowledge, [2] is the only other work besides [1]
that quantifies the connection state overhead by information
theory.

The outline of the paper is as follows. In Section II we
introduce the network model and the connection state variables.
The overhead is quantified in Section III, where we also introduce
a sequence of bounds for this quantity, derive an algorithm
for their computation and show their exponential convergence
towards the exact optimal overhead cost.

II. The model

The one-dimensional network is composed of nodes and
links connecting neighbouring nodes. The nodes are labelled
using the spatial variable $x \in \mathbb{Z}$. We choose $x$ to increase to
the right (see Figure 1). The links are labelled using the index
$x \in \mathbb{Z}$ such that the link $x$ connects the nodes $x$ and $x + 1$.
The dynamics of the network is described using a discretised
time variable $t \in \mathbb{N}$. The initial time is $t = 1$.




Fig. 1. The linear network. Nodes and links are indexed as shown. 

The probability space $\Omega := \{0,1\}^{\mathbb{N} \times \mathbb{Z}}$ contains elements
$\omega \in \Omega$ of the form $\omega = \{\omega_{t,x} : t \in \mathbb{N}, x \in \mathbb{Z}\}$. The state
of a link $x$ at time $t$ is described by the random variable
$X_t(x)$ which is by definition equal to $\omega_{t,x}$; "1" stands for up
or open, and "0" for down or closed. We shall introduce a
probability measure $P$ on the $\sigma$-field generated by the
finite-dimensional cylindrical subsets of $\Omega$.

All links $x$ are assumed to have identical and independent
statistics: $P = \bigotimes_{x \in \mathbb{Z}} p$ is a product over each $x \in \mathbb{Z}$. We now
consider $p$, i.e. the time evolution of a single link $x$. Since
all links $x$ have identical statistics, we consider only the
link $x = 1$ and write $X_t := X_t(1)$. The time evolution of
$X := \{X_t : t \in \mathbb{N}\}$ (and consequently of $X(x) := \{X_t(x) : t \in \mathbb{N}\}$)
is given by an autonomous^1 Markov process. Using
the abbreviation

$$p(b \mid a) := P[X_{t+1} = b \mid X_t = a],$$

where $a, b \in \{0, 1\}$, the distribution of the Markov process $X$
is determined by the transition matrix

$$T = \begin{pmatrix} p(1 \mid 1) & p(1 \mid 0) \\ p(0 \mid 1) & p(0 \mid 0) \end{pmatrix} = \begin{pmatrix} \bar d & u \\ d & \bar u \end{pmatrix}, \tag{1}$$

where $u, d \in (0, 1)$ are the free parameters of the model, and
$\bar\lambda := 1 - \lambda$ for any $\lambda \in [0, 1]$. Thus $u$ (resp. $d$) is the probability
that a closed (resp. open) link is opened (resp. closed) after
one time step.

The above Markov chain has a steady state probability
distribution on $\{0, 1\}$. For $b \in \{0, 1\}$ we have

$$p(b) := \lim_{t \to \infty} P[X_t = b \mid X_1 = a],$$

regardless of the initial condition $a \in \{0, 1\}$. From above we
get

$$U := p(1) = \frac{u}{u + d}, \tag{2a}$$
$$D := p(0) = \frac{d}{u + d}. \tag{2b}$$

Thus $U$ (resp. $D$) is the steady state probability of a link being
up (resp. down).

^1 We use the term 'autonomous' as a synonym for 'time-homogeneous'.

For simplicity we assume that at time 1 all the link variables
$\{X_1(x), x \in \mathbb{Z}\}$ are distributed according to the stationary
distribution. Time-homogeneity of the links then implies that

$$P[X_t(x) = 1] = U, \qquad P[X_t(x) = 0] = D, \tag{3}$$

for all $x \in \mathbb{Z}$ and $t \in \mathbb{N}$. Note that this restriction can always
be lifted since all our results concern the limit $t \to \infty$. For any
given initial distribution, conditions (3) will hold with arbitrary
accuracy for large enough times.
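
The approach to stationarity can be checked numerically. The following Python sketch is only an illustration: the function name `open_probability` and the parameter values are assumptions made here, not part of the model. It iterates the one-step law of a single link from an arbitrary initial condition and compares the result with $U = u/(u+d)$.

```python
def open_probability(u, d, p_open_initial, steps):
    """Return P[X = 1] for a single link after `steps` transitions.

    u: probability that a closed link opens in one time step
    d: probability that an open link closes in one time step
    p_open_initial: arbitrary initial probability of the link being open
    """
    p_open = p_open_initial
    for _ in range(steps):
        # One application of the transition matrix T from (1)
        p_open = (1.0 - d) * p_open + u * (1.0 - p_open)
    return p_open


u, d = 0.2, 0.3          # illustrative parameter values (assumption)
U = u / (u + d)          # stationary probability of an open link, cf. (2a)
for p0 in (0.0, 1.0):
    print(p0, open_probability(u, d, p0, steps=50), U)
```

In this sketch the distance to $U$ shrinks by the factor $|1-u-d|$ at every step, so both initial conditions agree with $U$ to machine precision after a few dozen steps.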

These remarks define $P$ uniquely. Figure 2 shows a space-time
diagram of a typical evolution of the link variables.



Fig. 2. A space-time view. Black links are open and gray links closed. 



A. Communications between the nodes 

We make the following assumptions about the communication
capabilities of the nodes.

(i) A node $x$ is able to send one message to its left
neighbour $x - 1$ and another (independent) message to
its right neighbour $x + 1$ at each time $t$ via links $x - 1$ and
$x$ respectively.

(ii) If the link $x$ is up at time $t$, i.e., $X_t(x) = 1$, then the
nodes $x$ and $x + 1$ can receive the messages they have
(possibly) sent to each other at the previous time $t - 1$.
If the link $x$ is down at time $t$ then these messages are
lost. However, the nodes $x$ and $x + 1$ are able to observe
that $X_t(x) = 0$ in this case.

(iii) If a node $x$ receives a message at time $t$ it may resend
it immediately, i.e., the destination neighbour is able to
receive the message at the time $t + 1$ provided the link
between it and $x$ is up at $t + 1$.

Distant nodes are able to communicate by using the nodes
between them as relays. We assume that when a link is open
it forms a communication channel that has some finite transfer
capacity. This last fact is not used for any calculations but is
stated here to make the subsequent considerations meaningful.

B. The overhead messages 

In order to use efficient routing schemes it is important that
a fresh connectivity status of each node is known at all times.
Since the network is linear the relevant information is, for each
node $x$, how far there exists an open path of links in both
directions. Because of the finite data propagation speed this
connection state information cannot be based on the current
state of the network; rather, it is extracted from the newest
available data at $x$ on each link of the network. Since the
network model is symmetric with respect to reflection about $x$
and the states of the links on left and right of $x$ are independent
we may restrict ourselves to the right direction only. The
quantity $M_t(x) \in \mathbb{N}_0$^2 expresses how many successive links
are believed to be open on the right-hand side of node $x$ at
time $t$. A natural definition of $M_t(x)$ in the light of the above
remarks is then as follows.

At the initial time $t = 1$ we set for all $x \in \mathbb{Z}$

$$M_1(x) := X_1(x).$$

As time advances nodes transmit information to their
neighbours according to the recursive scheme

$$M_t(x) := X_t(x)\,[M_{t-1}(x+1) + 1]. \tag{4}$$

Therefore

$$M_t(x) = \sum_{m=1}^{t} \prod_{k=1}^{m} X_{t+1-k}(x - 1 + k), \tag{5}$$

which, by the independence of the links, has a stationary
distribution

$$\lim_{t \to \infty} P[M_t(x) = m] = D\,U^m. \tag{6}$$

Note that (3) implies that the equality in (6) holds even without
the limit whenever $t > m$.
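
The recursion (4) and the geometric law (6) can be illustrated by a small Monte Carlo experiment. The Python sketch below is an assumption-laden illustration rather than part of the paper: the function name `simulate_messages`, the window size, the seed and the parameter values are all chosen here for convenience. It evolves a finite window of links, applies (4) at every time step, and compares the empirical distribution of $M_T(x)$ with $D\,U^m$.

```python
import random
from collections import Counter


def simulate_messages(u, d, T, n_nodes, seed=0):
    """Run recursion (4) up to time T and return M_T(x) for n_nodes nodes."""
    rng = random.Random(seed)
    U = u / (u + d)
    # Pad the window by T links so that edge effects never reach the
    # nodes that are returned.
    n_links = n_nodes + T
    # Stationary initial condition, cf. (3)
    X = [1 if rng.random() < U else 0 for _ in range(n_links)]
    M = list(X)  # M_1(x) = X_1(x)
    for _ in range(2, T + 1):
        # Advance every link by one step of its two-state Markov chain
        X = [(0 if rng.random() < d else 1) if x == 1
             else (1 if rng.random() < u else 0)
             for x in X]
        # Recursion (4). The rightmost node has no neighbour inside the
        # window; its neighbour's message is taken to be 0 and the
        # affected nodes are discarded below.
        M = [X[i] * (M[i + 1] + 1) for i in range(n_links - 1)] + [X[-1]]
    return M[:n_nodes]


u, d = 0.2, 0.3                      # illustrative values (assumption)
U, D = u / (u + d), d / (u + d)
samples = simulate_messages(u, d, T=30, n_nodes=20000)
counts = Counter(samples)
for m in range(4):
    print(m, counts[m] / len(samples), D * U**m)
```

The samples taken at neighbouring nodes are correlated, so the printed frequencies only agree with $D\,U^m$ up to statistical fluctuations.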

Because of translation symmetry, we restrict ourselves to
the study of the node $x = 1$ and abbreviate $M_t := M_t(1)$.
Then (5) becomes

$$M_t = \sum_{m=1}^{t} \prod_{k=1}^{m} X_{t+1-k}(k). \tag{7}$$

A glance at Figure 3 shows that the value of $M_t$ depends
only on the link variables in the time-space diagonal
$\Delta(t) := \{(s, x) : s = t + 1 - x \ge 1,\ x \ge 1\}$.
To simplify notation we re-index link states on the diagonals
$\Delta(t)$,

$$Z_t(x) := \begin{cases} X_{t+1-x}(x), & x \le t, \\ 0, & x > t, \end{cases}$$

so that by (7) $M_t$ becomes a deterministic function of the
infinite dimensional random vector $\underline{Z}_t := (Z_t(1), Z_t(2), \ldots)$.
Similarly, we define re-indexed messages on $\Delta(t)$ by setting

$$\tilde M_t(x) := M_{t+1-x}(x),$$

so that $\tilde M_t(1) = M_t = M_t(1)$ and the recursion relation (4)
simplifies to

$$\tilde M_t(x) = Z_t(x)\,[\tilde M_t(x+1) + 1]. \tag{8}$$

^2 We denote positive integers by $\mathbb{N}$ and write $\mathbb{N}_0 = \{0\} \cup \mathbb{N}$ for non-negative
integers.

Fig. 3. The diagonal links $\Delta(t)$ contributing to $M_t$ (shown in black).

Note that the effect of the transformation of the variables
$\{X_t(x)\} \mapsto \{Z_t(x)\}$ is equivalent to setting the information
propagation speed to infinity, as can be seen by comparing the
recursion relations (4) and (8). We may also consider a more
general network model in which each link $x$ transmits with
a certain (constant) speed $1/j(x)$, $j(x) = 0, 1, 2, 3, \ldots$. By a
similar variable transformation we can map this model to the
infinite speed model in the $Z_t(x)$ variables. Thus all following
results are equally valid for such more general networks. The
relevance of the value $M_t$ for the prediction of the true length
of the open path for data sent at time $t$ depends on the
parameters $u, d$ (and of course $j(x)$).

C. Entropies related to the link variables 

The entropy of a single link (say $x = 1$, $t = 1$) is

$$\mathrm{H}(X_1) = h(U),$$

where $\mathrm{H}(\,\cdot\,)$ is the entropy functional on random variables and

$$h(\lambda) := -\lambda \log \lambda - \bar\lambda \log \bar\lambda, \qquad \lambda \in [0, 1]. \tag{9}$$

The entropy rate of the process $X$ is by definition given by

$$\mathcal{H}(X) := \lim_{t \to \infty} \frac{1}{t}\, \mathrm{H}(X_t, \ldots, X_1).$$

Using the chain rule for entropy and Markovity (see [3] for
details) we may write

$$\mathcal{H}(X) = \lim_{t \to \infty} \mathrm{H}(X_{t+1} \mid X_t).$$

Since we assumed that $X_1$ is distributed according to the
stationary distribution, we get

$$\mathcal{H}(X) = \mathrm{H}(X_2 \mid X_1).$$

This may be easily evaluated to give

$$\mathcal{H}(X) = U\,h(d) + D\,h(u).$$
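
As a sanity check of the last formula, the following Python sketch evaluates $U\,h(d) + D\,h(u)$ and compares it with $\mathrm{H}(X_2 \mid X_1)$ computed directly from the joint law of $(X_1, X_2)$. The function names are hypothetical, and the use of base-2 logarithms (entropy in bits) is an assumption, since the base of the logarithm in (9) is not fixed in the text.

```python
import math


def h(lam):
    """Binary entropy h(lam) from (9), in bits (base 2 is an assumption)."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)


def entropy_rate(u, d):
    """Entropy rate of a single link: U h(d) + D h(u)."""
    U, D = u / (u + d), d / (u + d)
    return U * h(d) + D * h(u)


def conditional_entropy_direct(u, d):
    """H(X_2 | X_1) summed term by term over the joint law of (X_1, X_2)."""
    U, D = u / (u + d), d / (u + d)
    stationary = {1: U, 0: D}
    # transition[(a, b)] = p(b | a), read off the matrix T in (1)
    transition = {(1, 1): 1.0 - d, (1, 0): d, (0, 1): u, (0, 0): 1.0 - u}
    return -sum(stationary[a] * p * math.log2(p)
                for (a, _b), p in transition.items() if p > 0.0)


print(entropy_rate(0.2, 0.3), conditional_entropy_direct(0.2, 0.3))
```

Both functions return the same number up to rounding, and the rate never exceeds the single-link entropy $h(U)$.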



In the following we shall also encounter Markov chains
$X^{(j)} = \{X^{(j)}_t : t \in \mathbb{N}\}$ obtained from $X$ by retaining only every $j$-th time step.
We therefore "skip" over $j$ links at each time step. The
corresponding transition probabilities are characterised by the
two off-diagonal elements of $T^j$, denoted by

$$u_j := P[X_{t+j} = 1 \mid X_t = 0], \tag{10a}$$
$$d_j := P[X_{t+j} = 0 \mid X_t = 1]. \tag{10b}$$

Precisely as above, we find for the entropy rate of this
process:

$$\mathcal{H}(X^{(j)}) = U\,h(d_j) + D\,h(u_j), \tag{11}$$

where we used the fact that the stationary distribution of $X^{(j)}$
is the same as that of $X$.
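
For a two-state chain the $j$-step probabilities $u_j$ and $d_j$ have a simple closed form. The worked equation below is a standard fact about two-state Markov chains rather than a formula quoted from the text. The abbreviation $\lambda := 1-u-d$ is introduced here for illustration, and $T^*$ denotes the matrix whose two columns both equal the stationary distribution $(U, D)$.

```latex
\begin{align*}
T &= T^* + \lambda\,(I - T^*), &
T^* &= \begin{pmatrix} U & U \\ D & D \end{pmatrix}, &
\lambda &= 1 - u - d, \\
% T^* T^* = T^*, so T^* and I - T^* are complementary projections:
T^j &= T^* + \lambda^j\,(I - T^*), &
u_j &= U\,(1 - \lambda^j), &
d_j &= D\,(1 - \lambda^j).
\end{align*}
```

For $j = 1$ this reduces to $u_1 = U(u+d) = u$ and $d_1 = D(u+d) = d$, as it should. For large $j$ both probabilities approach the stationary values at the rate $|1-u-d|^j$.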

III. Overhead cost: entropy rate of the overhead messages

We now quantify the optimal (i.e. smallest possible) cost
of the connection state information overhead by the entropy
rate^3 of the stochastic process $M := \{M_t, t \in \mathbb{N}\}$. This
corresponds to the minimum number of bits that need to
be used on average to keep up to date on the number of
consecutive up-links in the right direction from a fixed node
$x$ (for more details see for instance [3], [4]). The rate is

$$\mathcal{H}(M) := \lim_{t \to \infty} \frac{1}{t}\, \mathrm{H}(M_t, \ldots, M_1) = \lim_{t \to \infty} \mathrm{H}(M_t \mid M_{t-1}, \ldots, M_1), \tag{12}$$

where the second equality follows by applying the chain
rule of entropy (note that both limits exist since $M$ is an
autonomous ergodic aperiodic process; see [3] for details).

A. Bounds for the message entropy rate 

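The evaluation of (12) is tedious. A more practical approach
is to compute lower and upper bounds that can be made as
accurate as desired. Define for $j \in \mathbb{N}$

$$\overline{\mathcal{H}}_j := \lim_{t \to \infty} \mathrm{H}(M_t \mid M_{t-1}, \ldots, M_{t-j+1}), \tag{13}$$
$$\underline{\mathcal{H}}_j := \lim_{t \to \infty} \mathrm{H}(M_t \mid M_{t-1}, \ldots, M_{t-j+1}, \underline{Z}_{t-j}). \tag{14}$$

It should not come as a surprise that $\overline{\mathcal{H}}_j$ (resp. $\underline{\mathcal{H}}_j$) is an
upper (resp. lower) bound for $\mathcal{H}(M)$ that becomes arbitrarily
accurate in the limit $j \to \infty$. This is the content of the
following.

Lemma 3.1: The sequence $\{\overline{\mathcal{H}}_j\}_{j \in \mathbb{N}}$ is non-increasing and
$\{\underline{\mathcal{H}}_j\}_{j \in \mathbb{N}}$ is non-decreasing. Furthermore for all $j \in \mathbb{N}$ we have

$$\underline{\mathcal{H}}_j \le \mathcal{H}(M) \le \overline{\mathcal{H}}_j.$$

Finally,

$$\overline{\mathcal{H}}_j - \underline{\mathcal{H}}_j \le C\,|1 - u - d|^j$$

for some constant $C = C(u, d)$.

^3 Note that this must still be multiplied by two to account for both right
and left directions.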
Proof: We omit the (easy) proof of monotonicity of the
sequences as well as the fact that they are bounds for $\mathcal{H}(M)$
(see for instance Lemma 4.4.1 in [3]). The convergence of the
bounds is postponed to Theorem 3.6 as it is easiest to prove
using results from the following section. ■

B. A recursive scheme for the bounds 

In this section we derive the main result: a recursive
algorithm for computing the bounds $\underline{\mathcal{H}}_j$, $\overline{\mathcal{H}}_j$ and thus for
approximating the exact entropy rate $\mathcal{H}(M)$ to an arbitrary
accuracy.

For the proof it will be useful to rewrite the entropy by
partitioning the probability space $\Omega$. Let $A \subset \Omega$ be an event.
Define $\mathrm{H}(\,\cdot : A)$ as the entropy functional computed using
the conditional probability measure $P[\,\cdot \mid A]$. For two random
variables $X$, $Y$ we have, for example,

$$\mathrm{H}(X \mid Y : A) = -\sum_{x, y} P[X = x, Y = y \mid A]\, \log P[X = x \mid Y = y, A].$$



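Lemma 3.2: If $A$ lies in the $\sigma$-field $\sigma(X, Y)$ generated by
$(X, Y)$, then

$$\mathrm{H}(X \mid Y) = \mathrm{H}(I_A \mid Y) + P[A]\, \mathrm{H}(X \mid Y : A) + P[A^c]\, \mathrm{H}(X \mid Y : A^c), \tag{15}$$

where $I_A$ is the indicator function of the event $A$, and $A^c$
denotes the complement of the set $A$.

Note that if $A \in \sigma(Y)$ the first term of (15) vanishes.
Proof: Using the fact that $I_A$ is a deterministic function of
$(X, Y)$ as well as the chain rule we have

$$\mathrm{H}(X \mid Y) = \mathrm{H}(X, I_A \mid Y) = \mathrm{H}(I_A \mid Y) + \mathrm{H}(X \mid I_A, Y).$$

The second term is equal to

$$\begin{aligned}
&-\sum_{i \in \{0,1\}} \sum_{x, y} P[X = x, Y = y, I_A = i]\, \log P[X = x \mid Y = y, I_A = i] \\
&\quad = -\sum_{i \in \{0,1\}} P[I_A = i] \sum_{x, y} P[X = x, Y = y \mid I_A = i]\, \log P[X = x \mid Y = y, I_A = i] \\
&\quad = P[A]\, \mathrm{H}(X \mid Y : A) + P[A^c]\, \mathrm{H}(X \mid Y : A^c). \qquad ■
\end{aligned}$$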

We now introduce two sequences that will play a key role
in the following. For $j \in \mathbb{N}$ define

$$p_j := \lim_{t \to \infty} P[M_t > \max\{M_{t-1}, \ldots, M_{t-j}\}];$$

we also set $p_0 := 1$. Define furthermore the differences

$$r_j := p_{j-1} - p_j,$$

for $j \in \mathbb{N}$.

In order to avoid writing explicit limits in the following
we introduce the equivalence relation $\sim$ to denote asymptotic
equality: $a(t) \sim b(t)$ means $\lim_{t \to \infty} a(t) = \lim_{t \to \infty} b(t)$.



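Theorem 3.3: The sequence of bounds $\underline{\mathcal{H}}_j$, $\overline{\mathcal{H}}_j$ can be
computed recursively from

$$\underline{\mathcal{H}}_{j+1} = \underline{\mathcal{H}}_j + \frac{p_j}{D} \left[ \mathcal{H}(X^{(j+1)}) - \mathcal{H}(X^{(j)}) \right], \tag{16a}$$
$$\overline{\mathcal{H}}_{j+1} = \underline{\mathcal{H}}_j + \frac{p_j}{D} \left[ \mathrm{H}(X_1) - \mathcal{H}(X^{(j)}) \right], \tag{16b}$$

and

$$\underline{\mathcal{H}}_1 = \frac{1}{D}\, \mathcal{H}(X^{(1)}), \tag{17a}$$
$$\overline{\mathcal{H}}_1 = \frac{1}{D}\, \mathrm{H}(X_1). \tag{17b}$$

Note that the probabilities $p_j$ (or, equivalently, the
differences $r_j$) must still be computed; this is done in
Appendix I. Everything else in the above expressions is known:
$\mathcal{H}(X^{(j)})$ was computed in (11) and $\mathrm{H}(X_1) = h(U)$.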

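A direct consequence of the theorem is an expression for
the exact entropy rate: From (17a) and (16a) we get

$$\mathcal{H}(M) = \frac{1}{D} \left[ \mathcal{H}(X^{(1)}) + \sum_{j=1}^{\infty} p_j \left( \mathcal{H}(X^{(j+1)}) - \mathcal{H}(X^{(j)}) \right) \right]. \tag{18}$$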
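
Theorem 3.3, together with the recursion for $r_j$ from Lemma 1.1 of Appendix I, translates directly into a short program. The Python sketch below is one possible reading and not the authors' implementation. The function name `overhead_bounds`, the truncation parameter `m_max` for the sum over $m$, the use of base-2 logarithms, and the closed form $u_j = U(1-\lambda^j)$, $d_j = D(1-\lambda^j)$ with $\lambda = 1-u-d$ (a standard fact for two-state chains) are all assumptions made here.

```python
import math


def h(lam):
    """Binary entropy from (9), in bits (the base is an assumption)."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)


def overhead_bounds(u, d, j_max, m_max=2000):
    """Return (lower, upper); entry j-1 of each list is the bound of order j."""
    U, D = u / (u + d), d / (u + d)
    lam = 1.0 - u - d
    # j-step transition probabilities of (10a)-(10b); the closed form is a
    # standard two-state result, not taken from the text. Index 0 is the
    # trivial zero-step case.
    u_j = [U * (1.0 - lam**j) for j in range(j_max + 1)]
    d_j = [D * (1.0 - lam**j) for j in range(j_max + 1)]
    # rate[j] is the entropy rate of the j-step chain, cf. (11)
    rate = [U * h(d_j[j]) + D * h(u_j[j]) for j in range(j_max + 1)]

    # r[j] from Lemma 1.1, with the sum over m truncated at m_max
    r = [0.0] * (j_max + 1)
    for m in range(m_max):
        # q[j] = (1 - d_j)^m, so q[0] = 1
        q = [(1.0 - d_j[j]) ** m for j in range(j_max + 1)]
        r_m = [0.0] * (j_max + 1)
        for j in range(1, j_max + 1):
            r_m[j] = q[j] - sum(q[j - i] * r_m[i] for i in range(1, j))
            r[j] += D * U**m * r_m[j]

    # p[j] = 1 - (r[1] + ... + r[j]), with p[0] = 1
    p = [1.0]
    for j in range(1, j_max + 1):
        p.append(p[-1] - r[j])

    lower = [rate[1] / D]  # (17a)
    upper = [h(U) / D]     # (17b)
    for j in range(1, j_max):
        previous = lower[j - 1]  # lower bound of order j
        upper.append(previous + p[j] / D * (h(U) - rate[j]))         # (16b)
        lower.append(previous + p[j] / D * (rate[j + 1] - rate[j]))  # (16a)
    return lower, upper


lower, upper = overhead_bounds(0.2, 0.3, j_max=10)  # illustrative values
for j, (lo, up) in enumerate(zip(lower, upper), start=1):
    print(j, lo, up)
```

The printed pairs bracket the overhead rate for the right direction only. The truncation at `m_max` is harmless as long as $U^{m_{\max}}$ is negligible, and needs to be increased when $U$ is close to 1.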



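Proof: [Theorem 3.3] We first introduce some notation.
Define the vector

$$\underline{M}^{(j)}_{t-1} := (M_{t-1}, \ldots, M_{t-j}) \tag{19}$$

and the $\infty$-norm $|\cdot|$ defined by

$$|(m_1, \ldots, m_j)| := \max\{m_1, \ldots, m_j\}.$$

The key idea of the proof is to partition the probability space
$\Omega = A \cup A^c$, where

$$A := \{ M_t > |\underline{M}^{(j)}_{t-1}| \},$$

and use Lemma 3.2. Some of the rigorous proofs of the
intuitively plausible steps (a-f) are postponed to Lemma 3.4.
We have

$$\begin{aligned}
\underline{\mathcal{H}}_{j+1} &\sim \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1}) \\
&\overset{(a)}{=} \mathrm{H}(I_A \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1})
 + P[A]\, \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1} : A)
 + P[A^c]\, \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1} : A^c) \\
&\overset{(b)}{=} \mathrm{H}(I_A \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1})
 + P[A]\, \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1} : A)
 + P[A^c]\, \mathrm{H}(M_t \mid \underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j} : A^c) \\
&\overset{(c)}{\sim} \underline{\mathcal{H}}_j
 + \left[ \mathrm{H}(I_A \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1}) - \mathrm{H}(I_A \mid \underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j}) \right] \\
&\qquad + P[A] \left[ \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1} : A) - \mathrm{H}(M_t \mid \underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j} : A) \right] \\
&\overset{(d)}{\sim} \underline{\mathcal{H}}_j + 0 + P[A] \left[ \mathrm{H}(M_t \mid \underline{Z}_{t-j-1}) - \mathrm{H}(M_t \mid \underline{Z}_{t-j}) \right] \\
&\overset{(f)}{\sim} \underline{\mathcal{H}}_j + \frac{p_j}{D} \left[ \mathcal{H}(X^{(j+1)}) - \mathcal{H}(X^{(j)}) \right],
\end{aligned}$$

where (a) follows from Lemma 3.2; (b) from Lemma 3.4 (i);
(c) from Lemma 3.2 applied to $X = M_t$ and
$Y = (\underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j})$; (d) from Lemma 3.4 (ii), (iv), (v); and (f) from
Lemma 3.5 together with $P[A] \sim p_j$.
Similarly,

$$\begin{aligned}
\overline{\mathcal{H}}_{j+1} &\sim \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1}) \\
&\overset{(a)}{=} \mathrm{H}(I_A \mid \underline{M}^{(j)}_{t-1})
 + P[A]\, \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1} : A)
 + P[A^c]\, \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1} : A^c) \\
&\overset{(b)}{=} \mathrm{H}(I_A \mid \underline{M}^{(j)}_{t-1})
 + P[A]\, \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1} : A)
 + P[A^c]\, \mathrm{H}(M_t \mid \underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j} : A^c) \\
&\overset{(c)}{\sim} \underline{\mathcal{H}}_j
 + \left[ \mathrm{H}(I_A \mid \underline{M}^{(j)}_{t-1}) - \mathrm{H}(I_A \mid \underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j}) \right] \\
&\qquad + P[A] \left[ \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1} : A) - \mathrm{H}(M_t \mid \underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j} : A) \right] \\
&\overset{(d)}{\sim} \underline{\mathcal{H}}_j + 0 + P[A] \left[ \mathrm{H}(M_t) - \mathrm{H}(M_t \mid \underline{Z}_{t-j}) \right] \\
&\overset{(f)}{\sim} \underline{\mathcal{H}}_j + \frac{p_j}{D} \left[ \mathrm{H}(X_1) - \mathcal{H}(X^{(j)}) \right],
\end{aligned}$$

where (a) follows from Lemma 3.2; (b) from Lemma 3.4 (i);
(c) from Lemma 3.2; (d) from Lemma 3.4 (ii), (iii), (v); and (f)
from Lemma 3.5 together with $P[A] \sim p_j$.

The initial values $\underline{\mathcal{H}}_1$, $\overline{\mathcal{H}}_1$ follow from Lemma 3.5. ■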

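Lemma 3.4: Using the notation of the proof of Theorem
3.3 we have

(i) $\mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1} : A^c) = \mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1} : A^c) = \mathrm{H}(M_t \mid \underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j} : A^c)$,

(ii) $\mathrm{H}(I_A \mid \underline{M}^{(j)}_{t-1}) = \mathrm{H}(I_A \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1}) = \mathrm{H}(I_A \mid \underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j})$,

(iii) $\mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1} : A) \sim \mathrm{H}(M_t)$,

(iv) $\mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1} : A) \sim \mathrm{H}(M_t \mid \underline{Z}_{t-j-1})$,

(v) $\mathrm{H}(M_t \mid \underline{M}^{(j-1)}_{t-1}, \underline{Z}_{t-j} : A) \sim \mathrm{H}(M_t \mid \underline{Z}_{t-j})$.

The proof of Lemma 3.4 is banished to Appendix II. To
complete the proof of Theorem 3.3 we still need the first order
bounds $\underline{\mathcal{H}}_1$, $\overline{\mathcal{H}}_1$.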

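Lemma 3.5: For any $j \in \mathbb{N}$ we have

$$\lim_{t \to \infty} \mathrm{H}(M_t \mid \underline{Z}_{t-j}) = \frac{1}{D}\, \mathcal{H}(X^{(j)}). \tag{20}$$

In particular, $\underline{\mathcal{H}}_1 = \mathcal{H}(X)/D$. Furthermore,

$$\lim_{t \to \infty} \mathrm{H}(M_t) = \frac{1}{D}\, \mathrm{H}(X_1),$$

so that $\overline{\mathcal{H}}_1 = \mathrm{H}(X_1)/D$.

Proof: By the recursion relation (8) we have
$M_t = Z_t(1) \cdot [\tilde M_t(2) + 1]$. Since $\tilde M_t(2) \ge 0$ we have by bijectivity
and the chain rule

$$\begin{aligned}
\mathrm{H}(M_t \mid \underline{Z}_{t-j})
&= \mathrm{H}(Z_t(1),\ Z_t(1) \cdot [\tilde M_t(2) + 1] \mid \underline{Z}_{t-j}) \\
&= \mathrm{H}(Z_t(1) \mid \underline{Z}_{t-j}) + \mathrm{H}(Z_t(1) \cdot [\tilde M_t(2) + 1] \mid Z_t(1), \underline{Z}_{t-j}) \\
&= \mathrm{H}(Z_t(1) \mid \underline{Z}_{t-j})
 + P[Z_t(1) = 1]\, \mathrm{H}(Z_t(1) \cdot [\tilde M_t(2) + 1] \mid \underline{Z}_{t-j} : \{Z_t(1) = 1\}) \\
&\qquad + P[Z_t(1) = 0]\, \mathrm{H}(Z_t(1) \cdot [\tilde M_t(2) + 1] \mid \underline{Z}_{t-j} : \{Z_t(1) = 0\}) \\
&= \mathrm{H}(Z_t(1) \mid \underline{Z}_{t-j}) + U\, \mathrm{H}(\tilde M_t(2) \mid \underline{Z}_{t-j} : \{Z_t(1) = 1\}) \\
&= \mathcal{H}(X^{(j)}) + U\, \mathrm{H}(\tilde M_t(2) \mid \underline{Z}_{t-j}),
\end{aligned}$$

where the last step follows from the fact that $\tilde M_t(2)$ is independent
of $Z_t(1)$. Using translation invariance we therefore get

$$\lim_{t \to \infty} \mathrm{H}(M_t \mid \underline{Z}_{t-j}) = \mathcal{H}(X^{(j)}) + U \lim_{t \to \infty} \mathrm{H}(M_t \mid \underline{Z}_{t-j}),$$

and (20) follows.

Furthermore from $P[M_t = m] \sim U^m D$ we get

$$\lim_{t \to \infty} \mathrm{H}(M_t) = -\sum_{m=0}^{\infty} U^m D\, \log(U^m D) = \frac{h(U)}{D} = \frac{\mathrm{H}(X_1)}{D},$$

where the function $h$ is defined in (9). ■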
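
The closed form for the entropy of the geometric law can be confirmed numerically. The Python sketch below sums the series $-\sum_m U^m D \log(U^m D)$ term by term and compares it with $h(U)/D$. The function names and the choice of base-2 logarithms are assumptions made here for illustration.

```python
import math


def h(lam):
    """Binary entropy from (9), in bits (the base is an assumption)."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0
    return -lam * math.log2(lam) - (1.0 - lam) * math.log2(1.0 - lam)


def message_entropy_series(U, terms=10000):
    """Entropy of the law P[M = m] = D U^m, summed term by term."""
    D = 1.0 - U
    total = 0.0
    for m in range(terms):
        prob = D * U**m
        if prob == 0.0:
            break  # remaining terms have underflowed to zero
        total -= prob * math.log2(prob)
    return total


U = 0.4  # illustrative value (assumption)
print(message_entropy_series(U), h(U) / (1.0 - U))
```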

C. Convergence of the bounds 

We now address the convergence of the bounds, thus
completing the proof of Lemma 3.1.

Theorem 3.6: For any $u, d \in (0, 1)$ there exists a constant
$C = C(u, d) < \infty$ such that

$$\overline{\mathcal{H}}_j - \underline{\mathcal{H}}_j \le C\,|1 - u - d|^j.$$

Proof: We start with three auxiliary results.
First, we notice that the eigenvalues of the single link
transition matrix $T$ in (1) are $1$ and $1 - u - d$, and since $|1 - u - d| < 1$
the limit

$$T^* := \lim_{j \to \infty} T^j = \begin{pmatrix} U & U \\ D & D \end{pmatrix} \tag{21}$$

exists. (This is just a restatement of the fact that $X$ has a unique
stationary distribution.) The convergence is exponentially fast,
i.e.,

$$\| T^j - T^* \| \le k_1\,|1 - u - d|^j, \tag{22}$$

where $\|\cdot\|$ is a matrix norm and $k_1 = k_1(u, d)$ is some finite
constant.

Second, the smooth function $g$ on $2 \times 2$ matrices $(0, 1)^{2 \times 2}$,
defined by

$$g(A) := -U A_{11} \log A_{11} - U A_{21} \log A_{21} - D A_{12} \log A_{12} - D A_{22} \log A_{22},$$

is Lipschitz continuous on closed subdomains. In particular,
for all $A, A' \in B_\varepsilon(T^*)$ we have

$$|g(A) - g(A')| \le k_2\, \| A - A' \|, \tag{23}$$

provided that $\varepsilon > 0$ is small enough that the closure of the
ball $B_\varepsilon(T^*) = \{A \in \mathbb{R}^{2 \times 2} : \|A - T^*\| < \varepsilon\}$ is contained in
$(0, 1)^{2 \times 2}$, and the finite constant $k_2 = k_2(u, d, \varepsilon)$ is large
enough.

Third, by a direct calculation we see that $g$ satisfies

$$\mathcal{H}(X^{(j)}) = g(T^j) \quad \text{and} \quad \mathrm{H}(X_1) = g(T^*).$$

Therefore, by expressing the difference of the recursion
relations (16b) and (16a) with these identities and using the
trivial bound $p_j \le 1$, we get

$$\overline{\mathcal{H}}_{j+1} - \underline{\mathcal{H}}_{j+1} \le \frac{1}{D}\, \left| g(T^*) - g(T^{j+1}) \right|. \tag{24}$$

If $j$ is large enough the estimates (23) and (22) can be
combined to yield

$$|g(T^j) - g(T^*)| \le k_2\, \|T^j - T^*\| \le k_1 k_2\, |1 - u - d|^j,$$

which together with (24) completes the proof. ■
Finally some remarks about convergence. From the theorem
it is clear that if $u + d \approx 1$ the convergence is fast. Indeed,
if $u + d = 1$ the first order terms $\underline{\mathcal{H}}_1 = \overline{\mathcal{H}}_1$ are exact. This
can also be seen directly: We have $u = U$, $d = D$, so that
$T = T^j = T^*$ and therefore $\mathcal{H}(X) = \mathcal{H}(X^{(j)}) = \mathrm{H}(X_1)$.

On the other hand, the convergence becomes slower if $u, d \approx 0$
or $\approx 1$. The limiting case $u = d = 0$ corresponds to a
static network and $u = d = 1$ is physically meaningless, which
is also why we excluded both cases from our discussion.

IV. Conclusion 

In a dynamic network information about connectivity must 
be sent through the network regularly. This connection state 
overhead consumes the available bandwidth of the network. 
It is therefore natural to ask what is the smallest possible (in 
the context of information theory) bandwidth required for the 
connection state overhead. In this work we provide the answer 
in the special case of a simple linear network model: As a main 
result we have presented an exact and rapidly converging series 
expression for the best achievable overhead data rate. 

We have only considered a linear network model. However, 
the results derived here are also applicable to the case of a tree 
with the connectivity information at each node being whether 
or not it is connected to the root, since this model is fully 
equivalent to the one-dimensional network. 

The generalisation of our results to linear networks with 
more general links that have a larger state space is probably 
possible by using the same or very similar techniques as 
here. However, the most interesting generalisations, such as 
more complex network topologies, seem to pose a far greater 
challenge. 

Appendix I
An effective algorithm for computing $r_j$

A "brute force" computation of $p_j$ is too complex to be
of any practical use if $j > 2$. We present here a more
convenient method. The result is a simple recursive algorithm
for calculating $r_j$. The probabilities $p_j$ can then be computed
from

$$p_j = 1 - \sum_{i=1}^{j} r_i.$$

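For $j \in \mathbb{N}$ we have

$$\begin{aligned}
r_j &= p_{j-1} - p_j \\
&\sim P[M_t > \max\{M_{t-1}, \ldots, M_{t-j+1}\}] - P[M_t > \max\{M_{t-1}, \ldots, M_{t-j}\}] \\
&= \sum_{m=0}^{\infty} P[M_t = m]\; P[M_{t-j} \ge m,\ M_{t-j+1} < m,\ \ldots,\ M_{t-1} < m \mid M_t = m].
\end{aligned} \tag{25}$$

Define the new random variable

$$Z^{(m)}_t := \prod_{x=1}^{m} Z_t(x),$$

so that

$$\{Z^{(m)}_t = 0\} = \{M_t < m\}.$$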
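Then we get from above

$$\begin{aligned}
r_j &\sim \sum_{m=0}^{\infty} P[Z^{(m)}_{t-j} = 1,\ Z^{(m)}_{t-j+1} = 0,\ \ldots,\ Z^{(m)}_{t-1} = 0 \mid Z^{(m)}_t = 1,\ Z_t(m+1) = 0]\; P[M_t = m] \\
&= D \sum_{m=0}^{\infty} U^m\, r^{(m)}_j,
\end{aligned}$$

where we have used (6) and $r^{(m)}_j$ is the limit

$$r^{(m)}_j := \lim_{t \to \infty} P[Z^{(m)}_{t-j} = 1,\ Z^{(m)}_{t-j+1} = 0,\ \ldots,\ Z^{(m)}_{t-1} = 0 \mid Z^{(m)}_t = 1]. \tag{26}$$

The above discussion is meaningless if $m = 0$; from (25),
however, we see that we must define

$$r^{(0)}_1 := 1, \qquad r^{(0)}_j := 0 \quad (j \ge 2),$$

for (26) to hold.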

Define now for $j \in \mathbb{N}$

$$q^{(m)}_j := \lim_{t \to \infty} P[Z^{(m)}_{t-j} = 1 \mid Z^{(m)}_t = 1].$$

For the following we note that the process obtained from
$X$ by reversing the time is also a Markov process with
transition probabilities identical^4 to those of $X$; for example
$P[X_{t-1} = 1 \mid X_t = 0] = u$. Thus

$$q^{(m)}_j = \bar d_j^{\,m}.$$
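The recursion relation for $r^{(m)}_j$ arises as follows. We rewrite
$q^{(m)}_j$ by decomposing the event $\{Z^{(m)}_{t-j} = 1,\ Z^{(m)}_t = 1\}$:
successively conditioning on the values of $Z^{(m)}_{t-i}$, $i = 1, \ldots, j$,
we get

$$\{Z^{(m)}_{t-j} = 1,\ Z^{(m)}_t = 1\} = \sum_{i=1}^{j} \{Z^{(m)}_{t-j} = 1,\ Z^{(m)}_{t-i} = 1,\ Z^{(m)}_{t-i+1} = 0,\ \ldots,\ Z^{(m)}_{t-1} = 0,\ Z^{(m)}_t = 1\},$$

where the sum means a union of disjoint events. Taking the
probability measure of both sides and using Markovity^5 of the
time-reversed $X$ process we have

$$q^{(m)}_j = r^{(m)}_j + \sum_{i=1}^{j-1} r^{(m)}_i\, q^{(m)}_{j-i},$$

which gives

$$r^{(m)}_j = \bar d_j^{\,m} - \sum_{i=1}^{j-1} \bar d_{j-i}^{\,m}\, r^{(m)}_i. \tag{27}$$

This is the desired recursion relation expressing $r^{(m)}_j$ as
a function of $r^{(m)}_1, \ldots, r^{(m)}_{j-1}$. Using $r^{(m)}_1 = \bar d^{\,m}$ we may
therefore find $r^{(m)}_j$ for all $j$.

We summarise the results:

Lemma 1.1: The quantity $r_j$, $j \in \mathbb{N}$, may be computed from

$$r_j = D \sum_{m=0}^{\infty} U^m\, r^{(m)}_j,$$

where $r^{(m)}_j$, $m \in \mathbb{N}_0$, satisfies the recursion relation

$$r^{(m)}_j = \bar d_j^{\,m} - \sum_{i=1}^{j-1} \bar d_{j-i}^{\,m}\, r^{(m)}_i.$$

As an example, we compute $r_1$, $r_2$ and $r_3$:

$$r_1 = D \sum_{m=0}^{\infty} \bar d^{\,m} U^m = \frac{D}{1 - \bar d\, U},$$

$$r_2 = D \sum_{m=0}^{\infty} \left[ \bar d_2^{\,m} - \bar d^{\,2m} \right] U^m = D \left[ \frac{1}{1 - \bar d_2 U} - \frac{1}{1 - \bar d^{\,2} U} \right],$$

$$r_3 = D \sum_{m=0}^{\infty} \left[ \bar d_3^{\,m} - 2\, \bar d^{\,m} \bar d_2^{\,m} + \bar d^{\,3m} \right] U^m = D \left[ \frac{1}{1 - \bar d_3 U} - \frac{2}{1 - \bar d\, \bar d_2 U} + \frac{1}{1 - \bar d^{\,3} U} \right].$$

^4 We use here the fact that the links are distributed according to the
stationary distribution at all times.

^5 Note that if $Z^{(m)}_t = 1$ then all of the relevant first $m$ links of $\underline{Z}_t$ are
known (to equal 1).

Appendix II
Proof of Lemma 3.4

The proof involves deriving equalities for conditional
probabilities. These then induce equalities of the conditional
entropies according to the following lemma.

Lemma 2.1: Let $X$, $Y$ be random variables, $\phi$ a function on
the range of $Y$, and suppose that, for all $x$, $y$,

$$P[X = x \mid Y = y] = P[X = x \mid Y \in \phi^{-1}(\phi(y))].$$

Then

$$\mathrm{H}(X \mid Y) = \mathrm{H}(X \mid \phi(Y)).$$

Proof: The proof is based on writing out the definition of
the conditional entropy $\mathrm{H}(X \mid Y)$, rewriting the sum
$\sum_y (\,\cdot\,) = \sum_s \sum_{y : \phi(y) = s} (\,\cdot\,)$ and using the assumption. We omit
further details. ■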
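Proof: [Lemma 3.4] Let us begin with (i). The conditioning
event is

$$A^c = \{ M_t \le |\underline{M}^{(j)}_{t-1}| \}.$$

Let $\underline{m} = (m_{t-1}, \ldots, m_{t-j}) \in \mathbb{N}_0^j$ and define

$$i(\underline{m}) := \min\{k \in \{1, \ldots, j\} : m_{t-k} = |\underline{m}|\}.$$

Let furthermore $\underline{z}', \underline{z}'' \in \{0, 1\}^{\mathbb{N}}$ be chosen so that
$\psi(\underline{z}') = m_{t-j}$ and $\psi(\underline{z}'') = m_{t-i(\underline{m})}$, where $\psi$ is a deterministic
function that gives $M_t$ as a function of $\underline{Z}_t$. Then we have,
for $m \in \mathbb{N}_0$ and $\underline{z} \in \{0, 1\}^{\mathbb{N}}$,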



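$$\begin{aligned}
&P[M_t = m \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ \underline{Z}_{t-j-1} = \underline{z},\ M_t \le |\underline{m}|] \\
&\overset{(a)}{=} P[Z_t(1) = \cdots = Z_t(m) = 1,\ Z_t(m+1) = 0 \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ \underline{Z}_{t-j-1} = \underline{z},\ M_t \le |\underline{m}|, \\
&\qquad\qquad Z_{t-i(\underline{m})}(1) = \cdots = Z_{t-i(\underline{m})}(|\underline{m}|) = 1,\ Z_{t-i(\underline{m})}(|\underline{m}| + 1) = 0] \\
&\overset{(b)}{=} P[Z_t(1) = \cdots = Z_t(m) = 1,\ Z_t(m+1) = 0 \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ M_t \le |\underline{m}|, \\
&\qquad\qquad Z_{t-i(\underline{m})}(1) = \cdots = Z_{t-i(\underline{m})}(|\underline{m}|) = 1,\ Z_{t-i(\underline{m})}(|\underline{m}| + 1) = 0] \\
&= P[M_t = m \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ M_t \le |\underline{m}|] \\
&\overset{(c)}{=} P[Z_t(1) = \cdots = Z_t(m) = 1,\ Z_t(m+1) = 0 \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ M_t \le |\underline{m}|,\ \underline{Z}_{t-i(\underline{m})} = \underline{z}''] \\
&\overset{(d)}{=} P[Z_t(1) = \cdots = Z_t(m) = 1,\ Z_t(m+1) = 0 \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ M_t \le |\underline{m}|,\ \underline{Z}_{t-i(\underline{m})} = \underline{z}'',\ \underline{Z}_{t-j} = \underline{z}'] \\
&\overset{(e)}{=} P[M_t = m \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ M_t \le |\underline{m}|,\ \underline{Z}_{t-j} = \underline{z}'] \\
&= P[M_t = m \mid \underline{M}^{(j-1)}_{t-1} = \underline{m}^{(j-1)},\ M_t \le |\underline{m}|,\ \underline{Z}_{t-j} = \underline{z}'],
\end{aligned}$$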



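where $\underline{m}^{(j-1)}$ denotes the $j - 1$ first components of $\underline{m}$; (a)
follows from rewriting the conditions $M_t = m$, $M_{t-i(\underline{m})} = |\underline{m}|$;
(b) from Markovity, independence and the fact that
$m \le |\underline{m}|$; (c) from independence and $m \le |\underline{m}|$; (d) from
Markovity; and (e) from independence and $m \le |\underline{m}|$. Then the
assertion follows from Lemma 2.1 by choosing the functions
$\phi_1(\underline{m}^{(j-1)}, \underline{z}', \underline{z}) := (\underline{m}^{(j-1)}, \psi(\underline{z}'), \underline{z})$, $\phi_2(\underline{m}^{(j-1)}, \underline{z}', \underline{z}) := (\underline{m}^{(j-1)}, \psi(\underline{z}'))$,
and $\phi_3(\underline{m}^{(j-1)}, \underline{z}', \underline{z}) := (\underline{m}^{(j-1)}, \underline{z}')$.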
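To prove (ii) choose $\underline{m}$, $\underline{z}$ and $\underline{z}'$ as above and write

$$\begin{aligned}
&P[M_t > |\underline{m}| \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ \underline{Z}_{t-j-1} = \underline{z}] \\
&\overset{(a)}{=} P[Z_t(1) = \cdots = Z_t(|\underline{m}| + 1) = 1 \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ \underline{Z}_{t-j-1} = \underline{z}] \\
&\overset{(b)}{=} P[Z_t(1) = \cdots = Z_t(|\underline{m}| + 1) = 1 \mid \underline{M}^{(j)}_{t-1} = \underline{m}] \\
&\overset{(c)}{=} P[Z_t(1) = \cdots = Z_t(|\underline{m}| + 1) = 1 \mid \underline{M}^{(j-1)}_{t-1} = \underline{m}^{(j-1)},\ \underline{Z}_{t-j} = \underline{z}'] \\
&= P[M_t > |\underline{m}| \mid \underline{M}^{(j-1)}_{t-1} = \underline{m}^{(j-1)},\ \underline{Z}_{t-j} = \underline{z}'],
\end{aligned}$$

where (a) follows from rewriting $M_t > |\underline{m}|$; (b) and (c) from
independence and Markovity (the full details are exactly as
above using the index variable $i(\underline{m})$).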

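The proofs of (iii), (iv) and (v) are almost identical; we only
show (iv). Let $m$, $\underline{m}$ and $\underline{z}$ be as above. First note that under
the conditions $\underline{M}^{(j)}_{t-1} = \underline{m}$ and $M_t > |\underline{m}|$ there is a bijective
map between $M_t$ and $\tilde M_t(|\underline{m}| + 2)$:

$$M_t = \tilde M_t(|\underline{m}| + 2) + |\underline{m}| + 1,$$

so that

$$\mathrm{H}(M_t \mid \underline{Z}_{t-j-1} : \{\underline{M}^{(j)}_{t-1} = \underline{m},\ M_t > |\underline{m}|\})
= \mathrm{H}(\tilde M_t(|\underline{m}| + 2) \mid \underline{Z}_{t-j-1} : \{\underline{M}^{(j)}_{t-1} = \underline{m},\ M_t > |\underline{m}|\}). \tag{29}$$

Now

$$\begin{aligned}
&P[\tilde M_t(|\underline{m}| + 2) = m \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ \underline{Z}_{t-j-1} = \underline{z},\ M_t > |\underline{m}|] \\
&\overset{(a)}{=} P[\tilde M_t(|\underline{m}| + 2) = m \mid \underline{M}^{(j)}_{t-1} = \underline{m},\ \underline{Z}_{t-j-1} = \underline{z},\ Z_t(1) = \cdots = Z_t(|\underline{m}| + 1) = 1] \\
&\overset{(b)}{=} P[\tilde M_t(|\underline{m}| + 2) = m \mid \underline{Z}_{t-j-1} = \underline{z}],
\end{aligned}$$

where (a) follows from rewriting the condition $M_t > |\underline{m}|$, and
(b) from independence. Now by Lemma 2.1 and (29) we get

$$\mathrm{H}(M_t \mid \underline{Z}_{t-j-1} : \{\underline{M}^{(j)}_{t-1} = \underline{m},\ M_t > |\underline{m}|\})
= \mathrm{H}(\tilde M_t(|\underline{m}| + 2) \mid \underline{Z}_{t-j-1}) \sim \mathrm{H}(M_t \mid \underline{Z}_{t-j-1}),$$

where the last step follows from translation invariance.
Therefore

$$\begin{aligned}
\mathrm{H}(M_t \mid \underline{M}^{(j)}_{t-1}, \underline{Z}_{t-j-1} : A)
&= \sum_{\underline{m}} P[\underline{M}^{(j)}_{t-1} = \underline{m} \mid A]\; \mathrm{H}(M_t \mid \underline{Z}_{t-j-1} : \{\underline{M}^{(j)}_{t-1} = \underline{m},\ M_t > |\underline{m}|\}) \\
&\sim \mathrm{H}(M_t \mid \underline{Z}_{t-j-1}) \sum_{\underline{m}} P[\underline{M}^{(j)}_{t-1} = \underline{m} \mid A] \\
&= \mathrm{H}(M_t \mid \underline{Z}_{t-j-1}). \qquad ■
\end{aligned}$$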